Mining literature for protein-protein interactions
Abstract
MOTIVATION: A central problem in bioinformatics is how to capture information from the vast body of current scientific literature in a form suitable for computer analysis. We address the special case of information on protein-protein interactions. We show that the frequencies of words in Medline abstracts can be used to determine whether a given paper discusses protein-protein interactions. For papers determined to discuss this topic, the relevant information can be captured for the Database of Interacting Proteins. Suitable gene annotations can also be captured.
RESULTS: Our Bayesian approach scores each Medline abstract for its probability of discussing the topic of interest, according to the frequencies of discriminating words found in the abstract. More than 80 discriminating words (e.g. complex, interaction, two-hybrid) were determined from a training set of 260 Medline abstracts corresponding to previously validated entries in the Database of Interacting Proteins. Using these words and a log-likelihood scoring function, approximately 2000 Medline abstracts were identified as describing interactions between yeast proteins. This approach now forms the basis for the rapid expansion of the Database of Interacting Proteins.
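The abstract describes the method in two steps: pick discriminating words from a training set, then score new abstracts with a log-likelihood function. It does not give the exact formulas. The minimal Python sketch below assumes one plausible version. A word counts as discriminating when a Poisson test finds it over-represented in training abstracts relative to a background corpus. Each such word's weight is the log ratio of its smoothed frequencies in the two sets, and an abstract's score is the sum of the weights of the discriminating words it contains. The function names, the Laplace smoothing, the `p_threshold=0.01` cutoff and `min_count=2` are hypothetical choices for illustration, not the paper's published parameters.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and return word tokens, keeping hyphenated terms such as 'two-hybrid'."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def doc_counts(abstracts):
    """Count, for each word, how many abstracts contain it at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(tokenize(text)))
    return counts


def smoothed_freq(counts, n_docs, word):
    """Laplace-smoothed fraction of abstracts containing `word` (assumption; never zero)."""
    return (counts[word] + 1) / (n_docs + 2)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower terms."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def discriminating_words(train, background, p_threshold=0.01, min_count=2):
    """Return {word: log-likelihood weight} for words over-represented in `train`.

    Hypothetical criterion: keep a word when seeing it in at least k training
    abstracts would be unlikely (Poisson tail < p_threshold) given how often
    it appears in the background corpus.
    """
    train_counts = doc_counts(train)
    bg_counts = doc_counts(background)
    n_train, n_bg = len(train), len(background)
    weights = {}
    for word, k in train_counts.items():
        if k < min_count:
            continue  # too rare in the training set to judge
        p_bg = smoothed_freq(bg_counts, n_bg, word)
        if poisson_upper_tail(k, n_train * p_bg) < p_threshold:
            p_train = smoothed_freq(train_counts, n_train, word)
            weights[word] = math.log(p_train / p_bg)
    return weights


def log_likelihood_score(abstract, weights):
    """Sum the weights of the discriminating words present in one abstract."""
    return sum(weights.get(word, 0.0) for word in set(tokenize(abstract)))
```

In use, you would build the weights from the validated training abstracts and a general Medline background sample. You would then rank unseen abstracts by `log_likelihood_score` and review those above a chosen cutoff. Summing per-word log ratios treats words as independent, which is the naive-Bayes simplification. Adding the log prior odds of a paper being on-topic would turn the score into log posterior odds.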
Similar resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
Discovering Domains Mediating Protein Interactions
Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi domain proteins and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However they do not consider the in...
A Tree Kernel-Based Method for Protein-Protein Interaction Mining from Biomedical Literature
As genomic research advances, the knowledge discovery from a large collection of scientific papers becomes more important for efficient biological and biomedical research. Even though current databases continue to update new protein-protein interactions, valuable information still remains in biomedical literature. Thus data mining techniques are required to extract the information. In this pape...
PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining
PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature is introduced. PubMiner utilize natural language processing and machine learning based data mining techniques for mining useful biological information such as protein-protein interaction from the massive literature data. The system recognizes biological terms such as gene, pr...
Discovering Patterns to Extract Protein-Protein Interactions from Full Biomedical Texts
Although there have been many research projects to extract protein pathways, most such information still exists only in the scientific literature, usually written in natural languages and defying data mining efforts. We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing pa...
Discovering patterns to extract protein-protein interactions from full texts
MOTIVATION Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mi...
Journal: Bioinformatics
Volume 17, Issue 4
Published: 2001